DOCUMENT RESUME



ED 0^7 958 



PS 007 424 



AUTHOR       Zimiles, Herbert
TITLE        A Radical and Regressive Solution to the Problem of
             Evaluation.
PUB DATE     Jan 73
NOTE         12p.; Paper presented at the Meeting of the Minnesota
             Round Table in Early Childhood Education (Wayzata,
             Minnesota, June 1973)

EDRS PRICE   MF-$0.75 HC-$1.50 PLUS POSTAGE
DESCRIPTORS  Affective Tests; *Classroom Environment; Classroom
             Observation Techniques; Classroom Research; Cognitive
             Processes; Conceptual Schemes; *Early Childhood
             Education; Educational Objectives; *Evaluation
             Methods; *Evaluation Needs; Models; *Preschool
             Programs; Social Maturity

ABSTRACT 

This paper reviews two major advances in preschool
evaluation strategy that developed as a result of trying to evaluate
Head Start, and proposes another evaluation approach. The first
advance in evaluation procedure was to conceive educational
objectives in terms of processes rather than products; that is, there
was a shift from achievement tests to tests of cognitive process
based on Piagetian problem-solving tasks. The second evaluation
advance was to recognize the importance of comprehensiveness by
extending evaluation content to include affective and social as well
as cognitive processes. The alternative plan proposed in this report
entails systematic and comprehensive evaluation of the child's school
environment, to be followed by a theoretical analysis of the
potential impact of his school experience. This approach represents a 
shift in emphasis from the assessment of impact on children to the 
assessment of the antecedent condition, the classroom environment. To 
implement such an approach to the evaluation of early childhood 
education programs, there is a need to explicitly formulate 
propositions regarding how and why preschool programs should work. On 
the basis of such a framework, methods must be devised for moving
into a classroom and reliably describing, in quantitative terms 
wherever possible, the salient dimensions of its environment and its 
interactions. (CS) 



A Radical and Regressive Solution to the Problem of Evaluation*

Herbert Zimiles

Bank Street College of Education



The irony of the title of this presentation stems from my observation that
the more our current efforts to evaluate educational programs strive for
relevance, the more invalid they become. Having reluctantly come to this
conclusion, I propose that we radically change our methodological framework
for evaluation. Let us examine the case for this proposal.

When Project Head Start was instituted, thereby vastly expanding preschool
education, it was accompanied by a mandate to evaluate its effectiveness. The
implication was that the program would stand or fall by this evaluation.

The evaluation of Head Start seemed precisely the situation which required
the kind of comprehensive evaluation we at Bank Street College had been
advocating and had begun to put into practice. One of the guiding principles
of our work has been the conception of schools as psychological fields, as
environments which significantly influence children's psychological
development (cognitive, affective and social) rather than as mere training
grounds for academic skills. Our book, The Psychological Impact of School
Experience (Minuchin, Biber, Shapiro, and Zimiles, 1969), reports the results
of an effort to implement and test this point of view by systematic and
empirical evaluation. The research was an intensive study of nine-year-old
children who were attending very different kinds of schools. We examined the
way in which these different educational experiences had affected the
children's self-awareness, interpersonal skills, problem-solving patterns,
group behavior, and other aspects of psychological functioning which relate
to human development.

*Adapted from a paper presented at the Minnesota Round Table in Early
Childhood Education, Wayzata, Minnesota, June 8-9, 1973.






The evaluation of Head Start, however, took a quite different and more
traditional turn. The first evaluation studies were conducted by
psychometricians whose main concern was for the experimental design of the
study. Few of the existing instruments had been standardized for use with
young children, and since a quantitative evaluation requires a standardized
test, the Stanford-Binet was almost automatically selected as the instrument
to be used to evaluate the effectiveness of Head Start. Much more attention
was given to problems of sampling, the designation of proper control groups,
and appropriate methods of statistical analysis of the data. Nevertheless,
questions inevitably arose regarding the relevance of Stanford-Binet items
for an evaluation of the impact of preschool education, and the search was on
for intellectual measures whose content was closer to the teaching and
learning which actually went on in preschool and which more accurately
reflected the cultural values of the population under study. As a result, the
priorities of standardization and quantification in the evaluation
instruments were lowered and the criterion of content relevance was raised to
a more central position.

The concept of relevance gradually broadened and became increasingly
sophisticated. Other measures of intellectual aptitude or achievement were
added. Then a more significant change occurred. Largely under the impetus of
the Piagetian rebirth, many investigators began to emphasize that preschool
should be fostering the ability to think and function effectively on
problem-solving tasks. The argument emphasized that preschools, especially
those attempting to provide compensatory education, should be less concerned
with training children to achieve specific skills or to learn specific
academic content and more concerned with fostering cognitive growth, now that
Piaget and Bruner and others had helped clarify what we meant by cognitive
growth. Accordingly, evaluators were admonished to revise their assessment
procedures still further and focus on measures of cognitive process as well
as cognitive achievement.

Each adjustment which defined criteria in greater breadth seemed to represent
important progress; it meant that evaluators were beginning to see the
fallibility of their simplistic criteria and that educators of young children
were coming to grips with the fact that they were not merely concerned with
training children to learn specific tasks. Program innovations such as the
introduction of a "Piagetian curriculum" virtually dictated that evaluation
criteria be defined in terms of cognitive process variables.

The next move forward, not surprisingly, was to extend the definition of
educational objectives and evaluation criteria beyond the cognitive realm. The
fact that many psychologists found this new domain an alien one is revealed by
the reference to it as "non-cognitive." Thus, although the social and affective
criteria were defined by exclusion, they were, at least, beginning to be
regarded as essential elements in a comprehensive evaluation battery.

Now, after less than a decade of intensive efforts to evaluate Head Start
and the new programs in open education, two major advances have occurred:
(1) educational objectives are being defined in terms of developmental
processes rather than discrete products; and (2) the content of evaluation
studies has been extended to include affective and social as well as
cognitive processes.

While this amazing progress is to be applauded, one wonders how much advance
in educational evaluation has actually been made. My own reservations are
based on several considerations. Perhaps the most obvious concern is that when
we examine the array of measures radiating from IQ and achievement tests to
tests of cognitive processes and then to tests of social functioning and
personality, we find a concomitant decline in validity. In attempting to
measure cognitive processes rather than products, our use of problem-solving
situations as opposed to conventional test items leads to a marked reduction
in the amount of cognitive behavior sampled, because it takes much more time
to assess problem-solving behavior. While problem-solving tasks, on the
surface, seem amenable to extensive analysis of qualitative features of
performance, in reality only a small number of behavioral characteristics can
be categorized reliably. The net effect of introducing such new methods of
assessment is to reduce the variability of scores, which adversely affects
both reliability and validity of measurement. Thus, problem-solving techniques
have limited potential for yielding highly differentiating quantitative data,
as compared with the wide range of scores and the high reliability of
multiple-itemed intellectual aptitude tests which sample many domains.
Personality measures are, of course, even less useful; at best, they have a
degree of construct validity which cannot be understood in quantitative terms.
It is hard to conceive of a single personality test which has the psychometric
credentials to serve as a criterion measure in an educational evaluation.
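
The connection between the number of behaviors sampled and reliability can be
illustrated with the Spearman-Brown formula from classical test theory. The
paper does not cite this formula; it is offered here only as a rough sketch.
It assumes that all items are parallel (equally reliable and interchangeable),
and the per-item reliability of 0.2 and the item counts are hypothetical
values chosen for illustration.

```python
def spearman_brown(single_item_reliability: float, n_items: int) -> float:
    """Predicted reliability of a test built from n parallel items.

    Assumption (not from the paper): items are parallel in the sense of
    classical test theory, so the Spearman-Brown prophecy formula applies.
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    r = single_item_reliability
    return n_items * r / (1 + (n_items - 1) * r)


# Hypothetical comparison: a few lengthy problem-solving tasks versus a
# multiple-item aptitude test, with the same per-item reliability of 0.2.
for n in (3, 10, 40):
    print(n, round(spearman_brown(0.2, n), 3))
# 3 0.429
# 10 0.714
# 40 0.909
```

Under these assumptions, an assessment that can afford only three tasks is far
less reliable than one that samples forty items of the same quality, which is
the direction of the effect described above.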

Another disappointing note is that an increase in the breadth of assessment
has not always been accompanied by a shift from product to process
orientation. While conservation and other Piagetian cognitive attributes are
replacing the learning of the alphabet in so-called innovative programs, such
programs still seem just as concerned with training as those of the past.
Conservation skills have merely replaced more traditional content in what
remains a very traditional form of education. If children are to be drilled
and trained, perhaps it would be better to train them in something that seemed
useful to them, something which has face validity. Piaget uses the
conservation paradigm, among others, to illustrate a mode and level of
cognitive functioning. Whether or not a child conserves number may be quite
revealing about his level of cognitive development, but it is not at all clear
that a child who is trained to conserve is very different from one who has not
been so trained.



If the recent reform in evaluation methodology has been distorted by many
of those who have adopted a Piagetian approach, even greater errors of
judgment have been committed in the name of personality assessment. I have
received urgent phone calls asking for a good personality measure to be
included in an evaluation battery in the same way that distributors are phoned
by storekeepers regarding a new line of items they want included on their
shelves. The fact that personality measurement remains one of the great
unsolved problems of more than 50 years of research activity seems not to have
penetrated those new converts who have suddenly recognized the value of
comprehensive, developmental approaches to education. Their indiscriminate
enthusiasm is not accompanied by an appreciation of the conceptual and
methodological complexities involved in working with personality data. There
is, therefore, every reason to be pessimistic about prospects for devising
personality measures good enough to be used in large-scale evaluation studies.
I have begun to believe that we have made an error in not taking Gordon
Allport's (1937) call for idiographic measurement of personality more
seriously. One of the problems with personality measurement is that different
traits are differentially salient for different children (or adults).
Across-the-board measurement of a particular trait generates a hodgepodge of
data. The data gathered from those for whom the trait is salient may be quite
telling, but the data obtained from the rest of the sample may have little or
no functional significance.

During the days when we were struggling with the problem of evaluating Head
Start, we were thwarted in our efforts to get Head Start teachers to tell us
what their main objectives were and to describe how they proposed to reach
them. The lack of readiness of educators to contribute to a substantial
formulation of educational methods and goals has hampered evaluation studies.
Finally, at the end of the school year, we turned to some articulate teachers
in the Early Childhood Center which Bank Street College was then operating in
a poverty area and asked them to run down the list of children in their class,
indicating for each child the areas of greatest growth during the preschool
year. In almost every instance, these teachers singled out for consideration a
facet of the child's personality or social behavior which had dominated his
functioning in school and which had undergone change in response to their
method of working with the child. But the attributes and context varied for
each child. There was no question in the minds of the teachers who provided
these data regarding the central role played by personality factors in the
school lives of these young children, but it would have been impossible to
capture the points they were making through the systematic application of a
particular personality scale or inventory. Each child manifested a distinctive
configuration of personality and social characteristics.

Another problem, well known to everyone but just as widely ignored, which
bedevils those who seek a more relevant and comprehensive evaluation of school
programs is the fact that a good deal of educational intervention is expected
to have future rather than immediate impact. Yet evaluation research is so
dominated by a mechanistic, push-pull outlook that we have learned to pretend
that whatever findings show up immediately constitute the essential impact of
an educational program. Such a perspective invites a narrow and superficial
approach to education.

For all these reasons, none of them new, I cannot celebrate the long overdue
move toward more relevant and more comprehensive evaluation. I have indicated
that there are limits to the degree to which such goals can be attained and
have observed that some of the notions of relevance and comprehensiveness have
been misunderstood and distorted, thereby threatening to discredit the
approach as a whole. I have also noted that comprehensive evaluation is
severely limited unless we are willing to assess the long-term impact of
educational programs.

This very pessimistic analysis does not imply that the efforts described
should be discontinued. We will not solve these important problems unless we
continue to work at them. I can think of no more challenging research for a
developmental psychologist than that of attempting to analyze the events of a
preschool classroom in terms of their potential influence on the participating
children, and then to devise an assessment of the children's characteristics
which are hypothesized as being influenced. However, such work cannot and
should not carry the label, or the burden, of evaluation because its findings,
by definition, lack the infallibility and definitiveness we automatically
associate with evaluation. When negative results are obtained they are much
more likely to reflect the methodological weaknesses of the study than the
failure of the educational program. The people working on such studies should
not be constrained by the design requirements of evaluation, nor should they
be required to carry the psychological and political burden of determining
whether a program will stand or fall on the basis of a clearly inadequate
study. Without the pressures of serving as an evaluator, researchers are
likely to be less defensive and more critical of their work, and therefore
freer to change and improve it.

If the evaluation of the impact of educational programs on children is to
be discontinued because such evaluations are either too incomplete or, when
they strive for comprehensiveness, invalid, then how shall programs be
evaluated? The alternative plan here proposed simply entails systematic and
comprehensive evaluation of the child's psychological school environment, to
be followed by a theoretical analysis of the potential impact of his school
experience. This would entail a shift in emphasis from the assessment of
impact on children to the assessment of the antecedent condition, the
classroom environment. Even those evaluation procedures which follow the
current mode of focusing on the impact of the program on the children are
increasingly calling for a detailed description of the school environment.
Their interest is primarily in more clearly defining the independent variable
of an evaluation study. Many evaluation studies have reported outcome data on
participating children without knowing with any degree of certainty or detail
what the nature of the program was whose impact was being documented. Indeed,
some evaluators make a virtue of such ignorance by claiming that they are
unbiased by any prior exposure to the program whose impact they assess. During
one of our evaluation studies of Project Head Start, we observed that many of
the children whom we had extensively tested had hardly attended the Head Start
program whose impact we were struggling to measure. It is equally absurd to
assess the impact of a program without considering what actually went on in
the program. Yet most evaluators select their assessment instruments without
firsthand knowledge of the program's way of operating. Apparently, evaluators
view their task as a fishing expedition in strange waters; they cast the best
nets available and hope for a good catch. The way in which the dependent
variables which are being measured by the evaluation instruments are described
makes it seem as though the measures have been chosen on the basis of a
theoretical analysis of the actual educational phenomena to be evaluated, but
in reality the measures are selected on the basis of convenience,
availability, and a superficial judgment of relevance. As matters now stand,
when one preschool program is reported as having "scored higher" in evaluation
than another, my main conclusion is that the content of the arbitrarily chosen
evaluation criteria more closely matched the transactions which took place in
one program than the other.

Our inability to measure the impact of a program precisely or comprehensively
is understandable in the light of existing methodological limitations, but
those limitations do not apply to the task of conceptualizing and describing
the program itself. Those who initiate and operate a program should be able to
describe what they are doing and what they are trying to accomplish. The task
of describing and recording classroom interaction is of a very different order
of magnitude from that of attempting to measure how a child's psychic
organization and functioning has been affected by experiencing such an
environment. It is a paradox that we have the responsibility and the capacity
to describe and record the essential character of an educational program, yet
do not do so; and at the same time, we do not know how to assess the impact of
a complex set of experiences on the psychological functioning of a developing
child, yet we persist in trying to do so.

But where are we in our evaluation if we simply document the nature of the
program as it occurs but are unready to assess its impact on the participating
children? We must carry our analysis of the program one step further. Just as
it is the obligation of a program initiator and director to describe the
nature of his program, so is it his responsibility to justify its usefulness
on the basis of some specified conceptual framework. Any set of actions
directed toward care and development of children is based upon an explicit or
implicit set of propositions regarding the consequences of the proposed
activities. Without a rational basis for its operation, a program does not
deserve to be implemented.

Most educators operate on a largely intuitive level. Their conceptual
framework is more implicit than explicit. The form of evaluation I am
advocating requires that this framework become explicit. One of the greatest
obstacles to progress in early childhood education is that formulation of the
nature of the young child and his development is incomplete, as is a
conceptual scheme for educational programming in relation to our understanding
of the child. If such an articulated theoretical framework existed, both in
relation to the child and to an educational program for him, it should be
possible to arrive at a set of procedures for describing and recording
educational environments and for analyzing such environments in terms of their
potential impact on the participating children. Thus, we need a system that
codifies observations of the adult models to which a child is exposed in
school, the emotional climate of the classroom, the nature of the activities
he experiences, the kinds of stimulation he receives, the values transmitted,
and other related facets of the school environment that are likely to affect
his development. In my view, this is the essence of educational evaluation,
and until we become better able to assess the impact of programs on children,
our primary method of evaluating early childhood education programs should be
to describe in great detail what they consist of and how they operate, and
then hypothesize, on the basis of our theoretical framework, how a given
program will affect children. While such a speculative approach to evaluation
may lack the apparent advantages of current, preferred, empirical methods for
validating a program, we are deluding ourselves, wasting time and effort,
misinterpreting data and thereby subverting educational planning, by
continuing to ignore the glaring deficiencies of empirical methods of
evaluating educational impact and neglecting those activities of observation
and theoretical analysis which are needed to shore up our conceptual framework
for program planning. We need to observe children and programs much more than
we do and we need to deal actively with the obligation to articulate and
elaborate our conceptual framework. One of the reasons why assessment of
impact has not progressed is the poverty of our thinking about children and
programs. The more articulate we become about children and programs, the
sharper and more effective will be our thinking about the assessment of
impact. As already emphasized, I am not suggesting that efforts to assess
impact should cease; on the contrary, they should expand, but not under the
aegis of evaluation.

While the procedural changes I am recommending may seem radical, they are
not at all new, but simply describe how we now function most of the time. The
proposition is that these procedures become codified. Most institutions and
activities are evaluated in the fashion here recommended. We have very little
systematic or experimentally controlled data regarding the efficacy of any of
our most important activities or institutions. We do not know if going to a
museum or library or concert really makes a difference, nor do we have sound
evidence regarding the value of taking a trip to Europe, yet we ungrudgingly
spend large sums of money on such ventures. If we are selecting a camp for our
child, we do not ask for data informing us about the average swimming speed
improvement, nor would we be very much influenced by such data were it
available. In our evaluation of the camp or the trip or the museum, we
systematically examine the environment and analyze its potential for producing
certain (usually multiple) desired consequences, and make our decision
accordingly. As a matter of fact, I suspect that most of us, were we selecting
a preschool for our children, would not place much stock in existing empirical
validity data no matter how complete, but would instead base our evaluation on
a visit to the school. Of course, it would be good if we could obtain sound,
quantitative data regarding the value of all of the above-mentioned
activities, but until such data are forthcoming, we would be wise to sharpen
our methods for looking at and describing these institutions and developing
our conceptual framework regarding how they function to produce particular
outcomes.

To implement such an approach to the evaluation of early childhood education
programs, we need to organize and elaborate our ideas and knowledge of young
children, and formulate explicitly our propositions regarding how and why
preschool programs should work. Given such a framework, we can move to the
classroom for a reliable description, in quantitative terms wherever possible,
of the salient dimensions which constitute its environment and its
interactions.
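
One common way to check whether a classroom description is "reliable" in the
quantitative sense is to have two observers code the same episodes
independently and compute their chance-corrected agreement. The sketch below
uses Cohen's kappa, a standard statistic that the paper itself does not name.
The category labels ("warm", "directive", "neutral") and the observation
records are invented for illustration and do not come from any Bank Street
instrument.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two observers' category codes.

    Each argument is a sequence of category labels, one per observed
    episode, in the same order for both observers.
    """
    if len(codes_a) != len(codes_b) or len(codes_a) == 0:
        raise ValueError("both observers must code the same non-empty set of episodes")
    n = len(codes_a)
    # Proportion of episodes on which the observers gave the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each observer's category frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both observers used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for the emotional climate of six classroom episodes.
observer_1 = ["warm", "warm", "directive", "neutral", "warm", "directive"]
observer_2 = ["warm", "neutral", "directive", "neutral", "warm", "warm"]
print(round(cohens_kappa(observer_1, observer_2), 3))  # 0.478
```

A value near 1 would suggest that the coding scheme can be applied
consistently by different observers; a value near 0 would suggest that the
categories need sharper definitions before the description can be treated as
quantitative data.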




We need to adopt this approach, not only because it will improve our
methods of evaluation, but because of the impact it would have on current
training and planning in early childhood education. It will foster an image of
the classroom as a field, consisting of multiple interactions and dynamics
which have a great variety of consequences. Evaluation of impact has had the
effect of circumscribing the scope of a classroom. It fosters an approach to
teaching in which the teacher works backward from the evaluation procedure;
her concept of her goals and her methods become increasingly bound to the
content of the evaluation instruments. If we need a jargon to describe these
contrasting outlooks, we can term one mode of evaluation divergent and the
other convergent. But most important, the procedure I am recommending places
the focus of early childhood education where it belongs: on the study of
children in school and the development of theoretical constructs for
explaining the influence of their school experience.

References

Allport, G. Personality: A psychological interpretation. New York: H. Holt &
Co., 1937.

Minuchin, P., Biber, B., Shapiro, E., & Zimiles, H. The psychological impact of
school experience. New York: Basic Books, 1969.






